UNITED STATES PATENT APPLICATION 

of 

David Hartwell 

and 

Darrell Donaldson 

for a 

CLOCK FORWARDING DATA RECOVERY 



PATENT 
015311-9822 









ENHANCED CLOCK FORWARDING DATA RECOVERY 

INCORPORATION BY REFERENCE OF RELATED APPLICATIONS

This patent application is related to the following co-pending, commonly owned 
U.S. Patent Applications, all of which were filed on even date with the within application 
for United States Patent and are each hereby incorporated by reference in their entirety: 

U.S. Patent Application Ser. No. (15311-2281) entitled ADAPTIVE DATA
PREFETCH PREDICTION ALGORITHM;

U.S. Patent Application Ser. No. (15311-2282) entitled UNIQUE METHOD OF 
REDUCING LOSSES IN CIRCUITS USING V2 PWM CONTROL;

U.S. Patent Application Ser. No. (15311-2283) entitled IO SPEED AND
LENGTH PROGRAMMABLE WITH BUS POPULATION;

U.S. Patent Application Ser. No. (15311-2284) entitled PARTITION
FORMATION USING MICROPROCESSORS IN A MULTIPROCESSOR 
COMPUTER SYSTEM; 

U.S. Patent Application Ser. No. (15311-2285) entitled SYSTEM AND
METHOD FOR USING FUNCTION NUMBERS TO INCREASE THE COUNT OF 
OUTSTANDING SPLIT TRANSACTIONS; 

U.S. Patent Application Ser. No. (15311-2286) entitled SYSTEM AND
METHOD FOR PROVIDING FORWARD PROGRESS AND AVOIDING 
STARVATION AND LIVELOCK IN A MULTIPROCESSOR COMPUTER SYSTEM; 

U.S. Patent Application Ser. No. (15311-2287) entitled ONLINE
ADD/REMOVAL OF SERVER MANAGEMENT INFRASTRUCTURE;

U.S. Patent Application Ser. No. (15311-2288) entitled AUTOMATED
BACKPLANE CABLE CONNECTION IDENTIFICATION SYSTEM AND METHOD;

U.S. Patent Application Ser. No. (15311-2289) entitled AUTOMATED
BACKPLANE CABLE CONNECTION IDENTIFICATION SYSTEM AND METHOD;

U.S. Patent Application Ser. No. (15311-2290) entitled CLOCK FORWARD
INITIALIZATION AND RESET SIGNALING TECHNIQUE;

U.S. Patent Application Ser. No. (15311-2292) entitled PASSIVE RELEASE
AVOIDANCE TECHNIQUE;

U.S. Patent Application Ser. No. (15311-2293) entitled COHERENT
TRANSLATION LOOK-ASIDE BUFFER; 

U.S. Patent Application Ser. No. (15311-2294) entitled DETERMINISTIC
HARDWARE BEHAVIOR BETWEEN MULTIPLE ASYNCHRONOUS CLOCK
DOMAINS THROUGH THE NOVEL USE OF A PLL; and

U.S. Patent Application Ser. No. (15311-2306) entitled VIRTUAL TIME OF
YEAR CLOCK. 

Field of the Invention 

This invention relates to the transfer of data between different integrated circuits 
(IC's). It is of particular utility in the transfer of data between IC's on different circuit 
boards interconnected by cables of varying lengths. The invention uses a novel clock- 
forwarding arrangement to synchronize operations on the data-receiving units to those on 
the transmitting chips. 



Background Information 

The transfer of data between various units in a data-processing system is normally 
effected by a physical connection between an output latch on the transmitting unit and an 
input latch on the receiving unit. This requires that the data be clocked into the input
latch as it is received over the physical connection. At relatively low clock frequencies a 
system-wide clock can be used for this purpose. However, at high clock frequencies,
which accompany high data-transfer rates, clock skew, i.e. the difference in clock phase
at different points in the system, presents a problem. Specifically, the clock edges at
which the data is clocked into the input latches may not occur at the times the incoming
data is received, resulting in errors in data reception.

To overcome this problem, various clock-forwarding arrangements have been
used, with transmitting units sending their clock signals to the receiving units along with
the data. The clock signals arrive at the input latches of the latter units along with the 
data and the data is therefore clocked into the latches with greatly reduced clock-skew 
error. 

Once received, the data must be moved from the input latch to other components 
that are to process the data in accordance with the function of the receiving unit. These 
components operate in synchronism with a local clock and the transfers from the input 
latch must therefore be effected in such manner as to accommodate the phase differences 
between the forwarded clock and the local clock. One can use a FIFO buffer for this
purpose, the input latch being the input stage of the buffer. The buffer can thus be loaded in
accordance with the forwarded clock and unloaded in accordance with the local clock.
However, to ensure proper operation the buffer must be large enough to contain the largest
burst of data that will be received by the receiving unit. This results in undue latency in
each transfer, since incoming data must pass through the successive stages of the buffer
before it is accessible to the receiving unit.

As connection lengths between the transmitting node and the receiving node
change in a clock-forwarded system, so does the phase relationship between the
forwarded clock seen at the receiving node and the local clock at the receiving node. One
can account for these phase differences by pre-calculating the expected phase differences
and accounting for them in the receive logic by modifying when data is first
removed from the FIFO. This has the advantage of reducing latency, but each time the
connection length is changed, calculations must be made and the operation of the receive
logic must be modified (typically by way of register bits that are written
by an external means) in order to account for the change in length. In cases where
connections between the transmitting node and the receiving node are of great length, process
variations within a standard connection length may result in skews too great to
correct. In this instance, larger FIFOs must again be utilized, latencies
increase and bandwidths suffer.
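
As a rough illustration of the pre-calculation described above, the following sketch
estimates the phase of the forwarded clock at the receiving node from a cable length. The
function name, the 5000 picosecond clock period and the propagation delay of 5 picoseconds
per millimetre are assumptions chosen only for illustration; they are not taken from any
particular prior arrangement.

```python
def forwarded_clock_phase_deg(cable_length_mm, period_ps=5000, delay_ps_per_mm=5):
    """Phase lag, in degrees, of the forwarded clock at the receiver.

    The defaults are illustrative assumptions: a 5000 ps clock period and a
    cable propagation delay of 5 ps per millimetre.
    """
    flight_time_ps = cable_length_mm * delay_ps_per_mm
    # Only the flight time modulo one clock period affects the phase relationship.
    return (flight_time_ps % period_ps) * 360 / period_ps


for length_mm in (300, 1000, 1250):
    print(length_mm, "mm ->", forwarded_clock_phase_deg(length_mm), "degrees")
```

Under these assumptions every change of cable length yields a different phase, which is why
the receive logic would have to be reprogrammed whenever a cable is replaced.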

SUMMARY OF THE INVENTION

The present invention generates a phase and edge aligned local clock signal in the
data-receiving unit by deriving it from the forwarded clock signal from the transmitting
unit. Transfers from the input latch to other components in the receiving unit can thus be
effected in step with the receipt of data in the input latch. Specifically, data can be
transferred from the input latch after it has been loaded therein and before receipt of the next
data transmission. For example, if the data is clocked into the input latch on positive-going
edges of the clock signal, it can be transferred out of the input latch on negative-going
edges. This eliminates the latency incurred with the use of FIFO buffers, while
ensuring the validity of the data passed to other components in the receiving unit.

The invention is particularly applicable to double-data-rate transfers, in which a
pair of input latches are loaded with data on alternate transitions of the forwarded clock
signal. That is, one latch is loaded on positive-going edges and the second on negative-going
edges. As described below, a slight delay is imposed in the transfer of data from
one of the latches to a third latch that receives the concatenated data of the two input
latches.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention description below refers to the accompanying drawing, which is a
diagram of a data-receiving unit incorporating the invention.

DETAILED DESCRIPTION OF AN ILLUSTRATIVE
EMBODIMENT

In the drawing I have illustrated the use of the invention in transmitting data from a
central processor unit (CPU) 10 to a data unit 12. The units 10 and 12 are parts of a
data processing system, which includes other units that need not be depicted for the
purposes of this description. The unit 12 may be, for example, an I/O Bridge that connects a
processor to industry standard busses such as PCI, PCI-X and AGP. It may reside on a separate
circuit board from the CPU 10, in which case data transmissions from the CPU to the
unit 12 pass over a cable 14. The cable 14 includes a set of data conductors 16 and clock
conductors 17 and 18. The conductors 17 and 18 carry a clock signal, the versions on the
two conductors being of opposite phase. The CPU 10 transmits data over the data
conductors 16 in synchronism with the clock signal. Specifically, whenever a transition in a
clock signal is transmitted from the CPU 10, corresponding data is transmitted over the
conductors 16. The incoming signals pass through receivers 19.

The illustrated system provides for double data-rate transmissions. That is, the
CPU 10 transmits data in synchronism with both the positive-going and negative-going
transitions of the clock signal. However, the invention is applicable also to systems in
which the transmissions are synchronized with only the positive- or negative-going
transitions.



For data reception from the CPU 10, the receiving unit 12 includes a pair of input
latches 20 and 22 that receive the data transmitted over the cable 14, a pair of latches 24
and 26 that concatenate the data received by the latches 20 and 22, and a phase-lock loop
(PLL) 28 that generates a local clock signal for the unit 12. The latches 20 and 22 have
data input terminals 20d and 22d that receive the data transmitted over the cable
conductors 16. The clock input terminals 20c and 22c receive delayed versions of the clock
signal from delay elements 30 and 32. The elements 30 and 32 preferably provide a delay of
90° to ensure that the latches 20 and 22 are clocked after the data voltages have
settled. A delay element 34 is interposed in the data input to compensate for delay of the
clock signal caused by the load imposed by the inputs to which the latter signal is
delivered.
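
To give a sense of scale for the 90° delay of the elements 30 and 32, the following sketch
converts a phase delay into a time delay. The helper name is hypothetical, and the 200
megahertz figure is assumed here only because it is the example clock frequency quoted at
the end of this description.

```python
def phase_to_delay_ps(phase_deg, clock_mhz):
    """Convert a phase delay in degrees to picoseconds for a given clock rate."""
    period_ps = 1_000_000 / clock_mhz  # one clock period in picoseconds
    return period_ps * phase_deg / 360


# A 90 degree delay at an assumed 200 MHz clock is a quarter of the 5000 ps period.
print(phase_to_delay_ps(90, 200))  # 1250.0
```

With double data-rate transmission the data changes every half period, so under this
assumption a quarter-period delay would place the latching edge near the middle of each
data interval.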

The data output of the input latch 20 is applied directly to the data input terminal
24d of the latch 24.

The data output of the input latch 22 is applied to the data input terminal 26d of the 
latch 26 by way of a delay element 38 whose function is described below.

The phase-lock loop 28 receives as a reference input the forwarded clock signal
CLK from the CPU 10. The other input of the PLL 28 is provided by the output of the 
PLL, delayed by a delay element 42. The electrical lengths of clock lines 44, 46 and 48 
are equal to each other. The output of the PLL is a local clock signal for the various 
components of the receiving unit 12 other than the input latches 20 and 22. The delay 
element 42 replicates the delay between the PLL 28 and the components that receive the 
local clock signals. These components are thus clocked in synchronization and in phase 
with the forwarded clock signal as received at the input latches 20 and 22. 
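
The role of the delay element 42 can be sketched with simple arithmetic. The model below
assumes an ideal, locked PLL that drives the phase difference between its reference and
feedback inputs to zero; the function name and the delay values are hypothetical and serve
only to illustrate the compensation.

```python
def clock_offset_at_components_ps(feedback_delay_ps, distribution_delay_ps):
    """Arrival time of a local clock edge at the clocked components, measured
    relative to the corresponding reference edge at the PLL input.

    Assumes an ideal locked PLL: the feedback edge coincides with the
    reference edge, so the PLL output edge leaves early by the feedback delay.
    """
    pll_output_edge_ps = -feedback_delay_ps
    return pll_output_edge_ps + distribution_delay_ps


# When the feedback delay replicates the distribution delay, the offset cancels.
print(clock_offset_at_components_ps(700, 700))  # 0
# A mismatch between the two delays appears directly as a phase offset.
print(clock_offset_at_components_ps(700, 750))  # 50
```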

The latch 22 is triggered by the positive-going edges of the incoming clock signal 
CLK and the latch 20 is triggered by the negative-going edges of the same signal. Thus 
the latches 20 and 22 are loaded on alternate clock edges. The latches 24 and 26 are both 
loaded on the positive-going edges of the local clock. 
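
The edge sequence just described can be sketched as a small behavioral model. This is a
simplification under stated assumptions, not a description of the actual circuit: the first
value is assumed to arrive on a positive-going edge, all latches are treated as ideal, the
delay elements are ignored, and the order in which the two halves are concatenated is
chosen arbitrarily. The function name is hypothetical.

```python
def ddr_receive(half_words):
    """Model latches 20, 22, 24 and 26 for a stream of values sent on
    alternating clock edges, starting (by assumption) with a positive edge.

    Returns the pairs loaded into latches 26 and 24 on positive edges.
    """
    latch20 = None  # loaded on negative-going edges
    latch22 = None  # loaded on positive-going edges
    pairs = []
    for edge_index, value in enumerate(half_words):
        if edge_index % 2 == 0:
            # Positive edge: latches 26 and 24 take the previous contents of
            # latches 22 and 20 before latch 22 is overwritten.
            if latch22 is not None and latch20 is not None:
                pairs.append((latch22, latch20))
            latch22 = value
        else:
            # Negative edge: only latch 20 is loaded.
            latch20 = value
    return pairs


print(ddr_receive([0xA, 0xB, 0xC, 0xD, 0xE]))  # [(10, 11), (12, 13)]
```

In this model each pair becomes available one positive edge after its first half was
latched, rather than after passing through a multi-stage buffer.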

The delay element 38 is inserted between the output of the input latch 22 and the
data input terminal of the latch 26 to deal with the effects of clock jitter, i.e. short-term
variations in the phase of the local clock relative to that of the forwarded clock signal
applied to the latch 22. Both of the latches 22 and 26 respond to positive-going CLK
edges. If a positive edge of the local clock arrives at the clock input terminal 26c
slightly before the arrival of the positive edge of the forwarded clock signal at the
terminal 22c, valid data from the input latch 22 will be transferred to the latch 26. However, if
the local clock edge arrives at the terminal 26c subsequent to the arrival of the
corresponding edge at the terminal 22c, the contents of the latch 22 may change before they
are transferred to the latch 26. The delay unit 38 delays the arrival of the change in
latch 22 contents at the data input terminal 26d so as to ensure that even a slightly late
clock edge at the terminal 26c will load the correct data into the latch 26.
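
The margin provided by the delay unit 38 can be sketched as a timing check. The model
assumes ideal latches with zero setup, hold and clock-to-output times, which a real design
would have to account for; the function name is hypothetical, and the 400 picosecond delay
and 5000 picosecond period are used only as example parameters.

```python
def latch26_captures_correct_data(local_clock_lateness_ps, delay38_ps, period_ps):
    """Return True if latch 26 loads the intended contents of latch 22.

    local_clock_lateness_ps is the arrival time of the local clock edge at
    terminal 26c minus the arrival time of the forwarded clock edge at
    terminal 22c (negative means the local edge is early). Ideal latches
    are assumed.
    """
    # The intended data sits at terminal 26d from one delay after the previous
    # edge at 22c until one delay after the current edge at 22c.
    earliest_ps = -(period_ps - delay38_ps)
    return earliest_ps < local_clock_lateness_ps < delay38_ps


print(latch26_captures_correct_data(300, 400, 5000))  # True: late edge still inside the margin
print(latch26_captures_correct_data(300, 0, 5000))    # False: without the delay the data has changed
print(latch26_captures_correct_data(500, 400, 5000))  # False: jitter exceeds the delay
```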

There is no need for a corresponding delay in the data input of the latch 24, since
it is clocked a half clock cycle before the next change in the contents of the input latch
20.

As an example of the delay provided by the delay unit 38, we have used a delay of
400 picoseconds with a clock frequency of 200 megahertz.
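
To put the example figures of 400 picoseconds and 200 megahertz in proportion, the short
sketch below expresses the delay as a fraction of the clock period and as a phase angle.
The helper name is hypothetical and the calculation is plain arithmetic on the two quoted
values.

```python
def delay_as_fraction_of_period(delay_ps, clock_mhz):
    """Express a delay in picoseconds as a fraction of one clock period."""
    period_ps = 1_000_000 / clock_mhz  # 5000 ps at 200 MHz
    return delay_ps / period_ps


fraction = delay_as_fraction_of_period(400, 200)
print(f"period fraction: {fraction:.0%}, phase: {fraction * 360:.1f} degrees")
# period fraction: 8%, phase: 28.8 degrees
```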